Search CORE

134 research outputs found

Automatic Extraction of Subcategorization from Corpora

Author: Briscoe Ted
Carroll John
Publication venue
Publication date: 01/01/1997
Field of study

We describe a novel technique and implemented system for constructing a subcategorization dictionary from textual corpora. Each dictionary entry encodes the relative frequency of occurrence of a comprehensive set of subcategorization classes for English. An initial experiment, on a sample of 14 verbs which exhibit multiple complementation patterns, demonstrates that the technique achieves accuracy comparable to previous approaches, which are all limited to a highly restricted set of subcategorization classes. We also demonstrate that a subcategorization dictionary built with the system improves the accuracy of a parser by an appreciable amount.Comment: 8 pages; requires aclap.sty. To appear in ANLP-9

arXiv.org e-Print Archive

CiteSeerX

Crossref

Sussex Research Online

Apportioning Development Effort in a Probabilistic LR Parsing System through Evaluation

Author: Briscoe Ted
Carroll John
Publication venue
Publication date: 01/01/1996
Field of study

We describe an implemented system for robust domain-independent syntactic parsing of English, using a unification-based grammar of part-of-speech and punctuation labels coupled with a probabilistic LR parser. We present evaluations of the system's performance along several different dimensions; these enable us to assess the contribution that each individual part is making to the success of the system as a whole, and thus prioritise the effort to be devoted to its further enhancement. Currently, the system is able to parse around 80% of sentences in a substantial corpus of general text containing a number of distinct genres. On a random sample of 250 such sentences the system has a mean crossing bracket rate of 0.71 and recall and precision of 83% and 84% respectively when evaluated against manually-disambiguated analyses.Comment: 10 pages, 1 Postscript figure. To Appear in Proceedings of the Conference on Empirical Methods in Natural Language Processing, University of Pennsylvania, May 199

arXiv.org e-Print Archive

CiteSeerX

Sussex Research Online

Corpus Annotation for Parser Evaluation

Author: Briscoe Ted
Carroll John
Minnen Guido
Publication venue
Publication date: 01/01/1999
Field of study

We describe a recently developed corpus annotation scheme for evaluating parsers that avoids shortcomings of current methods. The scheme encodes grammatical relations between heads and dependents, and has been used to mark up a new public-domain corpus of naturally occurring English text. We show how the corpus can be used to evaluate the accuracy of a robust parser, and relate the corpus to extant resources.Comment: 7 pages, LaTeX (uses eaclap.sty

arXiv.org e-Print Archive

CiteSeerX

Can Subcategorisation Probabilities Help a Statistical Parser?

Author: Briscoe Ted
Carroll John
Minnen Guido
Publication venue
Publication date: 01/01/1998
Field of study

Research into the automatic acquisition of lexical information from corpora is starting to produce large-scale computational lexicons containing data on the relative frequencies of subcategorisation alternatives for individual verbal predicates. However, the empirical question of whether this type of frequency information can in practice improve the accuracy of a statistical parser has not yet been answered. In this paper we describe an experiment with a wide-coverage statistical grammar and parser for English and subcategorisation frequencies acquired from ten million words of text which shows that this information can significantly improve parse accuracy.Comment: 9 pages, uses colacl.st

arXiv.org e-Print Archive

CiteSeerX

Sussex Research Online

Disambiguating Nouns, Verbs, and Adjectives Using Automatically Acquired Selectional Preferences

Author: Briscoe Ted
Diana McCarthy
John Carroll
Li Hang
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2003
Field of study

Selectional preferences have been used by word sense disambiguation (WSD) systems as one source of disambiguating information. We evaluate WSD using selectional preferences acquired for English adjective—noun, subject, and direct object grammatical relationships with respect to a standard test corpus. The selectional preferences are specific to verb or adjective classes, rather than individual word forms, so they can be used to disambiguate the co-occurring adjectives and verbs, rather than just the nominal argument heads. We also investigate use of the one-senseper-discourse heuristic to propagate a sense tag for a word to other occurrences of the same word within the current document in order to increase coverage. Although the preferences perform well in comparison with other unsupervised WSD systems on the same corpus, the results show that for many applications, further knowledge sources would be required to achieve an adequate level of accuracy and coverage. In addition to quantifying performance, we analyze the results to investigate the situations in which the selectional preferences achieve the best precision and in which the one-sense-per-discourse heuristic increases performance

CiteSeerX

Crossref

Sussex Research Online

Recommended from our members

Looking for Hyponyms in Vector Space.

Author: Briscoe Ted
Rei Marek
Publication venue: CoNLL
Publication date: 26/06/2014
Field of study

The task of detecting and generating hyponyms is at the core of semantic understanding of language, and has numerous practical applications. We investigate how neural network embeddings perform on this task, compared to dependency-based vector space models, and evaluate a range of similarity measures on hyponym generation. A new asymmetric similarity measure and a combination approach are described, both of which significantly improve precision. We release three new datasets of lexical vector representations trained on the BNC and our evaluation dataset for hyponym generation

Apollo (Cambridge)